
feature: fix langfuse thinking and text-edit issues (#371); omit diffs to reduce peak memory #376

Merged
claude-code-best merged 6 commits into main from feature/langfuse-and-some-improve
Apr 27, 2026
Conversation

@claude-code-best (Owner) commented Apr 27, 2026

Summary by CodeRabbit

  • New Features

    • Improved file-content matching with tab/space normalization and better CJK/multiline handling.
    • Conditional diff-collapsing for older messages based on message position.
    • Enhanced analytics/observability to include thinking budget tokens when extended thinking is enabled.
  • Tests

    • Added comprehensive tests verifying matching behavior across indentation, CJK, and multiline scenarios.
  • Documentation

    • README guidance for resolving certain install/update issues.

claude-code-best and others added 5 commits April 27, 2026 14:52
Add the thinking configuration (type/budgetTokens) to recordLLMObservation.
All providers (claude/gemini/openai), plus the tokenEstimation and sideQuery
call sites, now pass the thinking info through, so thinking usage can be observed in the Langfuse dashboard.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Langfuse tracing now passes the entire thinking object (including type and budget_tokens).
Analytics logging gains a matching thinkingBudgetTokens field, and logAPIQuery now
accepts a ThinkingConfig-typed parameter.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The Read tool renders tabs as spaces in its output, so text a user copies from it cannot be matched by the Edit tool.
Add a tab-to-space normalization fallback in findActualString, with a precise mapping back to the original file positions.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@mintlify

mintlify Bot commented Apr 27, 2026

Preview deployment for your docs. Learn more about Mintlify Previews.

Project: ccb-863780bf · Status: 🟢 Ready · Updated (UTC): Apr 27, 2026, 8:55 AM


@coderabbitai
Contributor

coderabbitai Bot commented Apr 27, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: b9f14193-70d1-43b0-833e-b0176d62d2d8

📥 Commits

Reviewing files that changed from the base of the PR and between 4cbef96 and 0fcdcd6.

📒 Files selected for processing (1)
  • README.md
✅ Files skipped from review due to trivial changes (1)
  • README.md

📝 Walkthrough


Enhances file string matching with tab-to-space fuzzy normalization and tests, threads a shouldCollapseDiffs prop to enable collapsing older tool-result diffs, and extends Langfuse tracing/logging to carry “thinking” configuration and budget tokens across providers and analytics.
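The tab-normalization fallback summarized above can be sketched as follows. This is a simplified illustration with assumed names (findWithTabFallback is not the PR's actual function): it retries a failed exact match after expanding tabs to single spaces on both sides, and relies on the 1:1 tab-to-space replacement so normalized offsets equal original offsets; the real findActualString maps match ranges back explicitly.

```typescript
// Hypothetical sketch, not the PR's actual code: fuzzy-match a search
// string against file content after normalizing tabs to single spaces,
// then slice the ORIGINAL content so the caller gets the real bytes
// (tabs included) to use as the edit's old string.
function findWithTabFallback(fileContent: string, search: string): string | null {
  // Cheapest path: the search string already matches verbatim.
  if (fileContent.indexOf(search) !== -1) return search

  // Fallback: compare with tabs expanded to spaces on both sides.
  const normalize = (s: string) => s.replace(/\t/g, ' ')
  const idx = normalize(fileContent).indexOf(normalize(search))
  if (idx === -1) return null

  // One space per tab keeps lengths identical, so the normalized
  // match range maps 1:1 onto the original content.
  return fileContent.substring(idx, idx + search.length)
}
```

In the PR this fallback sits behind exact and quote-normalized matching; the sketch keeps only the tab branch.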

Changes

  • FileEditTool String Matching & Tests (packages/builtin-tools/src/tools/FileEditTool/utils.ts, packages/builtin-tools/src/tools/FileEditTool/__tests__/utils.test.ts): findActualString gains fuzzy matching: normalize tabs→spaces, search the normalized content, and map normalized match ranges back to original offsets. Adds tests for tab/space indentation differences, CJK handling, and combined multiline cases.
  • Message Diff Collapse Prop Threading (src/components/Message.tsx, src/components/MessageRow.tsx, src/components/Messages.tsx, src/components/messages/UserToolResultMessage/UserToolResultMessage.tsx, src/components/messages/UserToolResultMessage/UserToolSuccessMessage.tsx): adds an optional shouldCollapseDiffs prop on message components; Messages computes a per-row value and threads it down to UserToolSuccessMessage, which can force the 'condensed' style when not verbose.
  • Langfuse Thinking Metadata Integration (src/services/langfuse/tracing.ts, src/services/api/claude.ts, src/services/api/gemini/index.ts, src/services/api/openai/index.ts, src/services/api/logging.ts, src/services/tokenEstimation.ts, src/utils/sideQuery.ts): recordLLMObservation accepts an optional thinking payload and includes it in Langfuse metadata. API handlers and logging now pass thinkingConfig and thinkingBudgetTokens when applicable; token estimation and sideQuery include the thinking budget when present.
  • Analytics Event Enhancement (src/main.tsx): the tengu_init analytics event conditionally includes thinkingBudgetTokens when thinking is "enabled".
  • Docs (README.md): adds guidance for reinstalling/uninstalling when a global npm install/update of the CLI package fails.

Sequence Diagram(s)

sequenceDiagram
  participant Client as Client
  participant API as API Handler (Claude/Gemini/OpenAI)
  participant Logger as logAPIQuery
  participant Tracer as recordLLMObservation
  participant Langfuse as Langfuse

  Client->>API: send request (includes thinkingConfig)
  API->>Logger: logAPIQuery(..., thinkingConfig)
  Logger->>Tracer: recordLLMObservation(rootSpan, { provider, model, thinking: thinkingConfig })
  Tracer->>Langfuse: send generation span metadata (includes thinking/budgetTokens)
  Langfuse-->>Tracer: ack
  Tracer-->>Logger: done
  Logger-->>API: logged
  API-->>Client: response (and possible stream fallback events with thinkingBudgetTokens)
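The metadata step in the flow above can be reduced to a small sketch. The helper name buildObservationMetadata and the exact merge shown here are assumptions for illustration; only the thinking payload shape ({ type, budgetTokens }) and the conditional thinkingBudgetTokens field come from the PR summary.

```typescript
// Hypothetical sketch of how an observation's metadata could carry the
// thinking configuration; not the real tracing.ts implementation.
type ThinkingConfig =
  | { type: 'enabled'; budgetTokens: number }
  | { type: 'disabled' }

function buildObservationMetadata(opts: {
  provider: string
  model: string
  thinking?: ThinkingConfig
}): Record<string, unknown> {
  return {
    provider: opts.provider,
    model: opts.model,
    // Attach thinking fields only when extended thinking is enabled,
    // mirroring the conditional spread described in the PR.
    ...(opts.thinking?.type === 'enabled' && {
      thinking: opts.thinking,
      thinkingBudgetTokens: opts.thinking.budgetTokens,
    }),
  }
}
```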

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~22 minutes

Possibly related PRs

Suggested labels

v6

Suggested reviewers

  • KonghaYao

Poem

🐰
I hopped through tabs and spaces bright,
Found strings where indentations fight,
I threaded props through message trees,
And whispered thinking to Langfuse's breeze,
A little rabbit cheers tonight!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage (⚠️ Warning): docstring coverage is 42.86%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them.
✅ Passed checks (4 passed)
  • Description Check (✅ Passed): check skipped; CodeRabbit’s high-level summary is enabled.
  • Title check (✅ Passed): the title accurately reflects the main changes: Langfuse thinking integration, text edit fixes, and diff omission for memory optimization.
  • Linked Issues check (✅ Passed): check skipped because no linked issues were found for this pull request.
  • Out of Scope Changes check (✅ Passed): check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


@codecov-commenter
Contributor

coderabbitai Bot left a comment

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/components/Messages.tsx (1)

819-847: ⚠️ Potential issue | 🟡 Minor

Add shouldCollapseDiffs to memo comparators or clarify intent.

shouldCollapseDiffs is not checked in areMessageRowPropsEqual (MessageRow.tsx:319) or areMessagePropsEqual (Message.tsx:507). When a new message is appended, shouldCollapseDiffs flips from false to true for the previously-latest tool_result row. However, both memoized comparators return true (skip re-render) for static/resolved messages, so the component never re-renders to collapse the diff despite the prop change.

If the intent is "collapse only at first paint," add a clarifying code comment. If the intent is "actively transition older diffs to condensed as the conversation grows," add shouldCollapseDiffs to at least one of the comparators (likely areMessageRowPropsEqual when the message contains a tool_result block).
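If the second intent is chosen, the comparator change could look like this. RowProps below is a simplified stand-in, not the component's real prop type; only the shouldCollapseDiffs check is the point of the sketch.

```typescript
// Minimal sketch of the suggested comparator change. RowProps is a
// hypothetical, simplified shape; the real MessageRow props differ.
type RowProps = {
  shouldCollapseDiffs?: boolean
  message: { id: string }
}

function areMessageRowPropsEqual(prev: RowProps, next: RowProps): boolean {
  // Force a re-render when the collapse flag flips so the
  // previously-latest tool_result row actually condenses its diff.
  if (prev.shouldCollapseDiffs !== next.shouldCollapseDiffs) return false
  // ...remaining comparisons stand in for the real equality logic.
  return prev.message.id === next.message.id
}
```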

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/components/Messages.tsx` around lines 819 - 847, The prop
shouldCollapseDiffs is changing but not included in the memo comparators, so add
it to the equality check (or document intent). Update areMessageRowPropsEqual
(in MessageRow.tsx) to compare prev.shouldCollapseDiffs !==
next.shouldCollapseDiffs and return false when it differs (especially for rows
with tool_result content), or if the desired behavior is to only apply collapse
on first render, add a clear comment near the shouldCollapseDiffs calculation in
Messages.tsx and keep comparators unchanged; ensure areMessagePropsEqual
(Message.tsx) is also updated if Message consumes shouldCollapseDiffs directly.
🧹 Nitpick comments (5)
src/components/Messages.tsx (1)

817-822: Hoist DIFF_COLLAPSE_DISTANCE to module scope and clarify the comment.

DIFF_COLLAPSE_DISTANCE is a tuning knob declared inside renderMessageRow (re-created on every row), and with the current value 0 the comment "beyond the latest N messages" reads as if multiple recent messages stay verbose — only the single last one does. Hoisting it next to MAX_MESSAGES_WITHOUT_VIRTUALIZATION / MESSAGE_CAP_STEP makes it discoverable and tunable.

♻️ Suggested refactor
 const MAX_MESSAGES_WITHOUT_VIRTUALIZATION = 200
 const MESSAGE_CAP_STEP = 50
+
+// Tool-result diffs are collapsed for every message except the last
+// DIFF_COLLAPSE_DISTANCE messages (verbose / ctrl+o overrides). Set to 0
+// so only the most recent tool_result keeps its full diff — older diffs
+// drop to 'condensed' to reduce memory peak on large transcripts.
+const DIFF_COLLAPSE_DISTANCE = 0
-    // Collapse diffs for messages beyond the latest N messages.
-    // verbose (ctrl+o) overrides and always shows full diffs.
-    const DIFF_COLLAPSE_DISTANCE = 0
     const shouldCollapseDiffs =
       renderableMessages.length - 1 - index > DIFF_COLLAPSE_DISTANCE
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/components/Messages.tsx` around lines 817 - 822, Hoist the
DIFF_COLLAPSE_DISTANCE constant out of renderMessageRow to module scope next to
MAX_MESSAGES_WITHOUT_VIRTUALIZATION and MESSAGE_CAP_STEP so it isn't re-created
per row and is easier to tune; update the inline comment where
DIFF_COLLAPSE_DISTANCE is used to accurately describe behavior (e.g. "Collapse
diffs for messages beyond the latest N messages; when N=0 only the most recent
message remains expanded") and keep the existing usage of shouldCollapseDiffs =
renderableMessages.length - 1 - index > DIFF_COLLAPSE_DISTANCE unchanged so the
logic still references the hoisted constant.
src/services/api/claude.ts (2)

2554-2556: Optional: extract the duplicated thinkingBudgetTokens spread.

The same conditional appears three times (disabled-fallback path, normal-fallback path, 404-fallback path). A small helper would tidy this up, but it's purely cosmetic.

const thinkingAnalyticsFields =
  thinkingConfig.type === 'enabled'
    ? { thinkingBudgetTokens: thinkingConfig.budgetTokens }
    : {}

…then ...thinkingAnalyticsFields at the three sites.

Also applies to: 2589-2591, 2708-2710

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/services/api/claude.ts` around lines 2554 - 2556, Extract the duplicated
conditional spread for thinkingBudgetTokens into a small helper object (e.g.,
create thinkingAnalyticsFields) so you only compute it once from thinkingConfig:
set thinkingAnalyticsFields = thinkingConfig.type === 'enabled' ? {
thinkingBudgetTokens: thinkingConfig.budgetTokens } : {} and then replace the
three occurrences of ...(thinkingConfig.type === 'enabled' && {
thinkingBudgetTokens: thinkingConfig.budgetTokens }) (the disabled-fallback,
normal-fallback and 404-fallback sites) with ...thinkingAnalyticsFields; keep
the symbol names thinkingConfig and thinkingBudgetTokens unchanged so the
refactor is minimal and local.

1782-1793: Inconsistent field naming with peer providers — consider normalizing to budgetTokens for consistency.

langfuseThinking captures queryParams.thinking directly, which uses budget_tokens (snake_case from the Anthropic SDK). However, recordLLMObservation accepts both budget_tokens and budgetTokens (per its type signature and JSDoc at src/services/langfuse/tracing.ts lines 81–90), so this will work without issue.

That said, every other provider normalizes to camelCase before passing to recordLLMObservation:

  • src/services/api/gemini/index.ts (lines 196-204): { type, budgetTokens }
  • src/utils/sideQuery.ts (lines 297-302): { type, budgetTokens: thinkingConfig.budget_tokens }
  • src/services/tokenEstimation.ts (line 357): { type: 'enabled', budgetTokens }

Aligning claude.ts to this pattern improves consistency and avoids relying on the permissive dual-format acceptance.

Optional: Normalize to match peer providers
-  let langfuseThinking: BetaMessageStreamParams['thinking'] | undefined
+  let langfuseThinking:
+    | { type: 'enabled'; budgetTokens: number }
+    | { type: 'adaptive' }
+    | undefined
   {
     const queryParams = paramsFromContext({
       model: options.model,
       thinkingConfig,
     })
     const logMessagesLength = queryParams.messages.length
     const logBetas = useBetas ? (queryParams.betas ?? []) : []
     const logEffortValue = queryParams.output_config?.effort
-    if (queryParams.thinking && queryParams.thinking.type !== 'disabled') {
-      langfuseThinking = queryParams.thinking
-    }
+    if (queryParams.thinking?.type === 'enabled') {
+      langfuseThinking = {
+        type: 'enabled',
+        budgetTokens: queryParams.thinking.budget_tokens,
+      }
+    } else if (queryParams.thinking?.type === 'adaptive') {
+      langfuseThinking = { type: 'adaptive' }
+    }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/services/api/claude.ts` around lines 1782 - 1793, langfuseThinking is set
directly from queryParams.thinking which uses snake_case (budget_tokens), but
other providers normalize to camelCase; update the block that assigns
langfuseThinking (in the function that calls paramsFromContext and uses
thinkingConfig) to map thinking.budget_tokens to thinking.budgetTokens (i.e.,
produce { type, budgetTokens } or equivalent) before passing into
recordLLMObservation so the Claude path matches the camelCase shape used by
gemini/index.ts, sideQuery.ts and tokenEstimation.ts and avoids relying on
dual-format acceptance.
packages/builtin-tools/src/tools/FileEditTool/utils.ts (1)

121-128: Combined branch re-normalizes whitespace over already-quote-normalized strings — harmless but redundant.

Lines 122-123 call normalizeWhitespace(normalizedFile) and normalizeWhitespace(normalizedSearch), where both inputs were produced earlier (lines 100-101). Quote normalization doesn't introduce or remove tabs, so combinedFile === wsNormalizedFile whenever no curly quotes exist and is a strict superset transformation otherwise. You can compute each normalization once and reuse, e.g. const combinedFile = normalizeQuotes(wsNormalizedFile) (and likewise for the search), avoiding one extra full-string traversal per call. Minor on small files; matters for very large ones where this is a hot path.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/builtin-tools/src/tools/FileEditTool/utils.ts` around lines 121 -
128, The combined-branch is redundantly calling
normalizeWhitespace(normalizedFile) and normalizeWhitespace(normalizedSearch)
where normalizedFile/normalizedSearch were produced earlier; instead reuse the
already-whitespace-normalized values (wsNormalizedFile, wsNormalizedSearch) and
apply quote normalization once: replace combinedFile =
normalizeWhitespace(normalizedFile) with combinedFile =
normalizeQuotes(wsNormalizedFile) and combinedSearch =
normalizeQuotes(wsNormalizedSearch), keeping the subsequent combinedIndex and
mapNormalizedMatchBackToFile usage unchanged.
packages/builtin-tools/src/tools/FileEditTool/__tests__/utils.test.ts (1)

109-187: Coverage gap: ratio-fallback edge case in mapNormalizedMatchBackToFile is not exercised.

Every test where the match extends to the end of fileContent happens to also match the entire file starting at offset 0, which makes the origEnd === -1 ratio-heuristic in mapNormalizedMatchBackToFile resolve to the correct value by coincidence (ratio * normalizedLength === fileContent.length). A case like a partial trailing match in a file whose tab density is non-uniform (e.g. "\t\t\t\t\t\t\t\t\t\there" searched with "    here") would expose the truncation bug noted on utils.ts. Consider adding such a case once the underlying mapping is hardened.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/builtin-tools/src/tools/FileEditTool/__tests__/utils.test.ts` around
lines 109 - 187, Add a unit test that reproduces the ratio-fallback edge-case in
mapNormalizedMatchBackToFile by using a fileContent with highly non-uniform tab
density (e.g. many leading tabs then text) and a search string with spaces for
the same trailing text (example: fileContent = "\t\t\t\t\t\t\t\t\t\there" and
searchSpaces = "    here"); use findActualString to locate the match and assert
the result is not null, that fileContent.includes(result!), and that the
returned substring equals the actual trailing slice of fileContent (so the test
fails if origEnd === -1 mapping truncates/miscomputes the end). Ensure the new
test is named clearly (e.g. "handles ratio-fallback for trailing partial
matches") and placed in the same utils.test.ts block covering tab/space
normalization to exercise mapNormalizedMatchBackToFile.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@packages/builtin-tools/src/tools/FileEditTool/utils.ts`:
- Around line 66-74: The JSDoc for normalizeWhitespace is inaccurate: the
function currently performs a global tab expansion, not leading-whitespace
collapsing. Update the comment above normalizeWhitespace to state that it
converts all tab characters to four spaces (global replacement) for consistent
fuzzy matching between Read tool output and file contents, and remove the
mention of collapsing leading whitespace so it matches the implementation.
- Around line 143-193: mapNormalizedMatchBackToFile can under-estimate origEnd
when the loop exits because origPos reached fileContent.length (match at EOF),
causing the ratio fallback to produce a truncated substring; fix by detecting
when the loop terminated due to reaching the end of file (origPos >=
fileContent.length) and, in that case, set origEnd = fileContent.length (or
clamp origEnd to fileContent.length) before applying the ratio fallback so
matches that extend to EOF return the full tail; update the logic inside
mapNormalizedMatchBackToFile to explicitly snap/clip origEnd to
fileContent.length when the file was exhausted.

---

Outside diff comments:
In `@src/components/Messages.tsx`:
- Around line 819-847: The prop shouldCollapseDiffs is changing but not included
in the memo comparators, so add it to the equality check (or document intent).
Update areMessageRowPropsEqual (in MessageRow.tsx) to compare
prev.shouldCollapseDiffs !== next.shouldCollapseDiffs and return false when it
differs (especially for rows with tool_result content), or if the desired
behavior is to only apply collapse on first render, add a clear comment near the
shouldCollapseDiffs calculation in Messages.tsx and keep comparators unchanged;
ensure areMessagePropsEqual (Message.tsx) is also updated if Message consumes
shouldCollapseDiffs directly.

---

Nitpick comments:
In `@packages/builtin-tools/src/tools/FileEditTool/__tests__/utils.test.ts`:
- Around line 109-187: Add a unit test that reproduces the ratio-fallback
edge-case in mapNormalizedMatchBackToFile by using a fileContent with highly
non-uniform tab density (e.g. many leading tabs then text) and a search string
with spaces for the same trailing text (example: fileContent =
"\t\t\t\t\t\t\t\t\t\there" and searchSpaces = "    here"); use findActualString
to locate the match and assert the result is not null, that
fileContent.includes(result!), and that the returned substring equals the actual
trailing slice of fileContent (so the test fails if origEnd === -1 mapping
truncates/miscomputes the end). Ensure the new test is named clearly (e.g.
"handles ratio-fallback for trailing partial matches") and placed in the same
utils.test.ts block covering tab/space normalization to exercise
mapNormalizedMatchBackToFile.

In `@packages/builtin-tools/src/tools/FileEditTool/utils.ts`:
- Around line 121-128: The combined-branch is redundantly calling
normalizeWhitespace(normalizedFile) and normalizeWhitespace(normalizedSearch)
where normalizedFile/normalizedSearch were produced earlier; instead reuse the
already-whitespace-normalized values (wsNormalizedFile, wsNormalizedSearch) and
apply quote normalization once: replace combinedFile =
normalizeWhitespace(normalizedFile) with combinedFile =
normalizeQuotes(wsNormalizedFile) and combinedSearch =
normalizeQuotes(wsNormalizedSearch), keeping the subsequent combinedIndex and
mapNormalizedMatchBackToFile usage unchanged.

In `@src/components/Messages.tsx`:
- Around line 817-822: Hoist the DIFF_COLLAPSE_DISTANCE constant out of
renderMessageRow to module scope next to MAX_MESSAGES_WITHOUT_VIRTUALIZATION and
MESSAGE_CAP_STEP so it isn't re-created per row and is easier to tune; update
the inline comment where DIFF_COLLAPSE_DISTANCE is used to accurately describe
behavior (e.g. "Collapse diffs for messages beyond the latest N messages; when
N=0 only the most recent message remains expanded") and keep the existing usage
of shouldCollapseDiffs = renderableMessages.length - 1 - index >
DIFF_COLLAPSE_DISTANCE unchanged so the logic still references the hoisted
constant.

In `@src/services/api/claude.ts`:
- Around line 2554-2556: Extract the duplicated conditional spread for
thinkingBudgetTokens into a small helper object (e.g., create
thinkingAnalyticsFields) so you only compute it once from thinkingConfig: set
thinkingAnalyticsFields = thinkingConfig.type === 'enabled' ? {
thinkingBudgetTokens: thinkingConfig.budgetTokens } : {} and then replace the
three occurrences of ...(thinkingConfig.type === 'enabled' && {
thinkingBudgetTokens: thinkingConfig.budgetTokens }) (the disabled-fallback,
normal-fallback and 404-fallback sites) with ...thinkingAnalyticsFields; keep
the symbol names thinkingConfig and thinkingBudgetTokens unchanged so the
refactor is minimal and local.
- Around line 1782-1793: langfuseThinking is set directly from
queryParams.thinking which uses snake_case (budget_tokens), but other providers
normalize to camelCase; update the block that assigns langfuseThinking (in the
function that calls paramsFromContext and uses thinkingConfig) to map
thinking.budget_tokens to thinking.budgetTokens (i.e., produce { type,
budgetTokens } or equivalent) before passing into recordLLMObservation so the
Claude path matches the camelCase shape used by gemini/index.ts, sideQuery.ts
and tokenEstimation.ts and avoids relying on dual-format acceptance.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 9c26788a-29fc-439c-b7e5-9451565e1b0b

📥 Commits

Reviewing files that changed from the base of the PR and between b47731a and 4cbef96.

📒 Files selected for processing (15)
  • packages/builtin-tools/src/tools/FileEditTool/__tests__/utils.test.ts
  • packages/builtin-tools/src/tools/FileEditTool/utils.ts
  • src/components/Message.tsx
  • src/components/MessageRow.tsx
  • src/components/Messages.tsx
  • src/components/messages/UserToolResultMessage/UserToolResultMessage.tsx
  • src/components/messages/UserToolResultMessage/UserToolSuccessMessage.tsx
  • src/main.tsx
  • src/services/api/claude.ts
  • src/services/api/gemini/index.ts
  • src/services/api/logging.ts
  • src/services/api/openai/index.ts
  • src/services/langfuse/tracing.ts
  • src/services/tokenEstimation.ts
  • src/utils/sideQuery.ts

Comment on lines +66 to +74
/**
 * Normalizes whitespace for fuzzy matching by converting tabs to spaces
 * and collapsing leading whitespace on each line to a canonical form.
 * This handles the case where Read tool output renders tabs as spaces,
 * so users copy spaces from the output but the file actually has tabs.
 */
function normalizeWhitespace(str: string): string {
  return str.replace(/\t/g, ' ')
}
Contributor


⚠️ Potential issue | 🟡 Minor

JSDoc doesn't match the implementation.

The doc says it "collapses leading whitespace on each line to a canonical form", but str.replace(/\t/g, ' ') replaces every tab in the string regardless of position (interior tabs included). Either tighten the implementation to leading-only whitespace or update the comment to describe what actually happens (global tab→4-space expansion). Note the global behavior is also what the rest of the cascade relies on, so the comment is the simpler fix.

📝 Suggested doc tweak
 /**
- * Normalizes whitespace for fuzzy matching by converting tabs to spaces
- * and collapsing leading whitespace on each line to a canonical form.
- * This handles the case where Read tool output renders tabs as spaces,
- * so users copy spaces from the output but the file actually has tabs.
+ * Normalizes whitespace for fuzzy matching by globally expanding every
+ * tab character to 4 spaces. This handles the case where Read tool output
+ * renders tabs as spaces, so users copy spaces from the output but the
+ * file actually has tabs. Tab width is fixed at 4 — files using a
+ * different tab width will not match via this fallback.
  */
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-/**
- * Normalizes whitespace for fuzzy matching by converting tabs to spaces
- * and collapsing leading whitespace on each line to a canonical form.
- * This handles the case where Read tool output renders tabs as spaces,
- * so users copy spaces from the output but the file actually has tabs.
- */
+/**
+ * Normalizes whitespace for fuzzy matching by globally expanding every
+ * tab character to 4 spaces. This handles the case where Read tool output
+ * renders tabs as spaces, so users copy spaces from the output but the
+ * file actually has tabs. Tab width is fixed at 4 — files using a
+ * different tab width will not match via this fallback.
+ */
 function normalizeWhitespace(str: string): string {
   return str.replace(/\t/g, ' ')
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/builtin-tools/src/tools/FileEditTool/utils.ts` around lines 66 - 74,
The JSDoc for normalizeWhitespace is inaccurate: the function currently performs
a global tab expansion, not leading-whitespace collapsing. Update the comment
above normalizeWhitespace to state that it converts all tab characters to four
spaces (global replacement) for consistent fuzzy matching between Read tool
output and file contents, and remove the mention of collapsing leading
whitespace so it matches the implementation.

Comment on lines +143 to +193
function mapNormalizedMatchBackToFile(
  fileContent: string,
  normalizedFile: string,
  normalizedStart: number,
  normalizedLength: number,
): string {
  // Build a sparse mapping from normalized position → original position.
  // We only need to map the range [normalizedStart, normalizedStart + normalizedLength].
  let normPos = 0
  let origPos = 0
  let origStart = -1
  let origEnd = -1

  while (origPos < fileContent.length && normPos <= normalizedStart + normalizedLength) {
    if (normPos === normalizedStart) {
      origStart = origPos
    }
    if (normPos === normalizedStart + normalizedLength) {
      origEnd = origPos
      break
    }

    const origChar = fileContent[origPos]!
    if (origChar === '\t') {
      // Tab expands to 4 spaces in normalized version
      const nextNormPos = normPos + 4
      // If normalizedStart falls within this expanded tab, snap to origPos
      if (normPos < normalizedStart && nextNormPos > normalizedStart && origStart === -1) {
        origStart = origPos
      }
      if (normPos < normalizedStart + normalizedLength && nextNormPos > normalizedStart + normalizedLength && origEnd === -1) {
        origEnd = origPos + 1
      }
      normPos = nextNormPos
      origPos++
    } else {
      normPos++
      origPos++
    }
  }

  // Fallback: if we couldn't map precisely, use character-count heuristic
  if (origStart === -1) origStart = 0
  if (origEnd === -1) {
    // Approximate: use the ratio of original to normalized length
    const ratio = fileContent.length / normalizedFile.length
    origEnd = Math.round(origStart + normalizedLength * ratio)
  }

  return fileContent.substring(origStart, origEnd)
}
Contributor


⚠️ Potential issue | 🟠 Major

mapNormalizedMatchBackToFile can return a truncated substring when the match extends to EOF.

The loop terminates as soon as origPos >= fileContent.length, which occurs naturally when the match ends at the end of fileContent. In that case origEnd stays -1 and the ratio fallback (fileContent.length / normalizedFile.length) is used — but that ratio is only valid if tab density is uniform across the whole file. With non-uniform tabs, the fallback under-estimates origEnd and silently returns a shorter substring than the actual match.

Counter-example:

  • fileContent = "\t\t\t\t\t\t\t\t\t\there" (10 leading tabs, length 14)
  • normalizedFile.length = 44
  • search = "    here" (4 spaces + "here") → match at normalized index 36, length 8 → origStart = 9
  • ratio = 14/44 ≈ 0.318 → origEnd = round(9 + 8 * 0.318) = 12
  • returned: "\the" instead of "\there"

Because the callers in FileEditTool.ts (lines 303-315 and 458-467) and UI.tsx (lines 296-304) feed this result directly into file.split(actualOldString) for occurrence counting and into getPatchForEdit without any validation, a truncated return can silently miscount matches and produce the wrong replacement.
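The truncation can be reproduced mechanically. The sketch below re-runs the function quoted in this comment (same logic, lightly reflowed) on the counter-example; the search string is taken to be four spaces plus "here", consistent with the stated normalized length of 8.

```typescript
// Repro of the EOF-truncation case, using the same logic as the quoted
// mapNormalizedMatchBackToFile (tabs expand to 4 spaces when mapping).
function mapNormalizedMatchBackToFile(
  fileContent: string,
  normalizedFile: string,
  normalizedStart: number,
  normalizedLength: number,
): string {
  let normPos = 0
  let origPos = 0
  let origStart = -1
  let origEnd = -1
  const normEnd = normalizedStart + normalizedLength

  while (origPos < fileContent.length && normPos <= normEnd) {
    if (normPos === normalizedStart) origStart = origPos
    if (normPos === normEnd) {
      origEnd = origPos
      break
    }
    if (fileContent[origPos] === '\t') {
      const next = normPos + 4
      if (normPos < normalizedStart && next > normalizedStart && origStart === -1) origStart = origPos
      if (normPos < normEnd && next > normEnd && origEnd === -1) origEnd = origPos + 1
      normPos = next
    } else {
      normPos++
    }
    origPos++
  }

  if (origStart === -1) origStart = 0
  if (origEnd === -1) {
    // Ratio fallback: only valid when tab density is uniform.
    const ratio = fileContent.length / normalizedFile.length
    origEnd = Math.round(origStart + normalizedLength * ratio)
  }
  return fileContent.substring(origStart, origEnd)
}

// 10 leading tabs + "here" (length 14); normalized view is 40 spaces + "here" (length 44).
const fileContent = '\t'.repeat(10) + 'here'
const normalizedFile = ' '.repeat(40) + 'here'
// Search "    here" matches at normalized index 36 with length 8.
const result = mapNormalizedMatchBackToFile(fileContent, normalizedFile, 36, 8)
// result is "\the" (truncated); the correct slice is fileContent.substring(9), i.e. "\there".
```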

🔧 Proposed fix — explicitly snap origEnd to end-of-file when the loop exhausted fileContent
   while (origPos < fileContent.length && normPos <= normalizedStart + normalizedLength) {
     // ... existing body ...
   }

-  // Fallback: if we couldn't map precisely, use character-count heuristic
-  if (origStart === -1) origStart = 0
-  if (origEnd === -1) {
-    // Approximate: use the ratio of original to normalized length
-    const ratio = fileContent.length / normalizedFile.length
-    origEnd = Math.round(origStart + normalizedLength * ratio)
-  }
+  // If the loop exhausted fileContent while the match still ran to (or past) its end,
+  // the match extends to EOF and origEnd should be the end of fileContent.
+  if (origEnd === -1 && normPos >= normalizedStart + normalizedLength) {
+    origEnd = fileContent.length
+  }
+  // Last-resort fallback (should not normally be reached given the cascade above).
+  if (origStart === -1) origStart = 0
+  if (origEnd === -1) origEnd = fileContent.length
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/builtin-tools/src/tools/FileEditTool/utils.ts` around lines 143 -
193, mapNormalizedMatchBackToFile can under-estimate origEnd when the loop exits
because origPos reached fileContent.length (match at EOF), causing the ratio
fallback to produce a truncated substring; fix by detecting when the loop
terminated due to reaching the end of file (origPos >= fileContent.length) and,
in that case, set origEnd = fileContent.length (or clamp origEnd to
fileContent.length) before applying the ratio fallback so matches that extend to
EOF return the full tail; update the logic inside mapNormalizedMatchBackToFile
to explicitly snap/clip origEnd to fileContent.length when the file was
exhausted.

claude-code-best merged commit c80e593 into main on Apr 27, 2026
9 checks passed